CDS Invenio: batch delete records or interval of records (from python interpreter)

Some time ago I came up with this little hack to give Invenio the ability to delete a record from the command line.

If you need to delete a lot of records (e.g. on your testing/development server), you can add this other hack to bibeditcli.py:

Delete several records from Invenio: the dirty way

This works, but it is not necessarily the way to go. There is another way to achieve the same result (records deleted) without overloading bibsched with one task per record. We'll go over that one later, though.

First things first: let's go the dirrrrrty way:

def cli_delete_interval(recid_inicio, recid_fin):
    """
    Delete records from recid_inicio to recid_fin, both included
    You'd better make sure...
    """
    import sys  # harmless if bibeditcli.py already imports it

    try:
        recid_inicio = int(recid_inicio)
    except ValueError:
        print "ERROR: First record ID must be an integer, not %s." % recid_inicio
        sys.exit(1)
    try:
        recid_fin = int(recid_fin)
    except ValueError:
        print "ERROR: End record ID must be an integer, not %s." % recid_fin
        sys.exit(1)

    if recid_inicio > recid_fin:
        print "ERROR: First record ID must not be greater than last record ID."
        sys.exit(1)

    # range() excludes its end value, so add 1 to include recid_fin
    for recid in range(recid_inicio, recid_fin + 1):
        # Load the record, mark it as deleted (980 $c DELETED) and save it;
        # every record saved this way ends up as its own bibsched task
        (record, junk) = get_record(CFG_SITE_LANG, recid, 0, "false")
        add_field(recid, 0, record, "980", "", "", "c", "DELETED")
        save_temp_record(record, 0, "%s.tmp" % get_file_path(recid))
        save_xml_record(recid)
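
The bounds are the easiest part to get wrong here: the docstring promises "both included", while Python's range() stops one short of its second argument. Here is a minimal sketch of just that check, with no Invenio involved. The name inclusive_recids is made up for this illustration, and it raises ValueError instead of calling sys.exit() so it can be tried out safely.

```python
def inclusive_recids(first, last):
    """Return the record IDs from first to last, both included.

    Raises ValueError if a bound is a non-numeric string or if
    first is greater than last. (Hypothetical helper, not part of Invenio.)
    """
    first = int(first)   # int() raises ValueError for input such as "abc"
    last = int(last)
    if first > last:
        raise ValueError("first record ID %d is greater than last record ID %d"
                         % (first, last))
    # range() stops one short of its second argument, hence the + 1
    return list(range(first, last + 1))
```

For example, inclusive_recids(5125, 5128) returns [5125, 5126, 5127, 5128], so the last ID really is part of the interval.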

This is how you call the new function from Python.
First, go to $PATH_TO_INVENIO/lib/python and start the Python interpreter:

[miguel@mydevinvenioinstance ~]# cd /soft/cds-invenio/lib/
[miguel@mydevinvenioinstance lib]# python
[GCC 4.1.2 20080704 (Red Hat 4.1.2-50)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>

And then, just…

>>> import invenio
>>> from invenio.bibeditcli import cli_delete_interval
>>> # the following line will delete records from ID=5125 to ID=7899 .... 
>>> # BE CAREFUL! GREAT POWER COMES WITH GREAT RESPONSIBILITY
>>> 
>>> cli_delete_interval(5125,7899)
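
Since a typo in either ID deletes the wrong records, it may be worth putting a confirmation step in front of the call. What follows is only a sketch: confirm_and_delete is a made-up name, and both the delete function and the prompt function are passed in as parameters so that the snippet does not depend on Invenio.

```python
def confirm_and_delete(first, last, delete_func, ask):
    """Ask for confirmation, then call delete_func(first, last).

    delete_func: the function doing the real work, e.g. cli_delete_interval
                 (passed in, so this sketch imports nothing from Invenio).
    ask:         a prompt function such as raw_input (Python 2) or
                 input (Python 3).
    Returns True if delete_func was called, False otherwise.
    """
    count = last - first + 1  # both ends included
    question = ("Delete %d records (ID %d to ID %d)? Type 'yes' to confirm: "
                % (count, first, last))
    answer = ask(question)
    if answer.strip().lower() != "yes":
        print("Aborted, nothing deleted.")
        return False
    delete_func(first, last)
    return True
```

From the interpreter session above, that would be called as confirm_and_delete(5125, 7899, cli_delete_interval, raw_input).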

Delete several records from Invenio: the not-so-dirty way

If you take a look at the cli_delete_interval we just came up with, or run it over a big interval, you will see that it generates a whole lot of new tmp files and sends a lot of tasks to bibsched (one for every record). Not efficient. Not nice.

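This code is better: just one tmp file (the line that would remove it afterwards is left commented out at the end) and one single task sent to bibsched.
Please notice the # EDIT HERE! part near the top of the function: tmpfile must point to a path you can read and write.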

def cli_delete_interval(recid_inicio, recid_fin):
    """
    By: Miguel Martin 20120130
    Goal:
      Delete records from recid_inicio to recid_fin, both included
      Creates just a tmp file and a task (just one) is sent to bibsched
    """

    import sys  # harmless if bibeditcli.py already imports it
    from invenio.bibrecord import record_xml_output
    from invenio.bibtask import task_low_level_submission

    # EDIT HERE! FILEPATH MUST BE READABLE/WRITABLE! ######
    tmpfile = "/home/miguelm/tmp/borrado.xml"
    # #####################################################

    try:
        recid_inicio = int(recid_inicio)
    except ValueError:
        print "ERROR: First record ID must be an integer, not %s." % recid_inicio
        sys.exit(1)
    try:
        recid_fin = int(recid_fin)
    except ValueError:
        print "ERROR: End record ID must be an integer, not %s." % recid_fin
        sys.exit(1)

    if recid_inicio > recid_fin:
        print "ERROR: First record ID must not be greater than last record ID."
        sys.exit(1)

    # Collect the XML of every record, marked as deleted, in one single file
    fd = open(tmpfile, "w")
    # range() excludes its end value, so add 1 to include recid_fin
    for recid in range(recid_inicio, recid_fin + 1):
        (record, junk) = get_record(CFG_SITE_LANG, recid, 0, "false")
        add_field(recid, 0, record, "980", "", "", "c", "DELETED")
        fd.write(record_xml_output(record))

    fd.close()
    # One single bibupload task for the whole file
    task_low_level_submission('bibupload', 'bibedit', '-P', '5', '-r', tmpfile)
    # Left commented out: the task is only queued here, so the file
    # presumably still has to exist when bibsched gets around to running it
    #os.system("rm %s" % tmpfile)
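
If you would rather not hard-code the tmpfile path, Python's standard tempfile module can pick a writable one for you. Below is a minimal sketch of that idea, not something the function above does. The name write_batch_file is made up, and it assumes bibsched runs the task as the same user that creates the file, because mkstemp() makes the file readable and writable by its owner only.

```python
import os
import tempfile


def write_batch_file(xml_chunks, directory=None):
    """Write all chunks into one new temporary file and return its path.

    xml_chunks: an iterable of strings, e.g. one XML record each.
    directory:  where to create the file (None = the system temp dir).
    (Hypothetical helper, not part of Invenio.)
    """
    # mkstemp() creates the file and returns an open descriptor plus the path
    fd, path = tempfile.mkstemp(suffix=".xml", prefix="batch_delete_",
                                dir=directory)
    handle = os.fdopen(fd, "w")
    try:
        for chunk in xml_chunks:
            handle.write(chunk)
    finally:
        handle.close()
    return path
```

The returned path is what would be handed to bibupload in place of tmpfile. The file is not removed automatically, so cleaning it up once the task has finished is still up to you.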

Cheers!

